47 research outputs found

    Fast and accurate semantic annotation of bioassays exploiting a hybrid of machine learning and user confirmation

    Bioinformatics and computer-aided drug design rely on the curation of a large number of protocols for biological assays that measure the ability of potential drugs to achieve a therapeutic effect. These assay protocols are generally published by scientists in the form of plain text, which needs to be more precisely annotated in order to be useful to software methods. We have developed a pragmatic approach to describing assays according to the semantic definitions of the BioAssay Ontology (BAO) project, using a hybrid of machine learning based on natural language processing, and a simplified user interface designed to help scientists curate their data with minimum effort. We have carried out this work based on the premise that pure machine learning is insufficiently accurate, and that expecting scientists to find the time to annotate their protocols manually is unrealistic. By combining these approaches, we have created an effective prototype for which annotation of bioassay text within the domain of the training set can be accomplished very quickly. Well-trained annotations require single-click user approval, while annotations from outside the training set domain can be identified using the search feature of a well-designed user interface, and subsequently used to improve the underlying models. By drastically reducing the time required for scientists to annotate their assays, we can realistically advocate for semantic annotation to become a standard part of the publication process. Once even a small proportion of the public body of bioassay data is marked up, bioinformatics researchers can begin to construct sophisticated and useful searching and analysis algorithms that will provide a diverse and powerful set of tools for drug discovery researchers.

    BioAssay Ontology (BAO): a semantic description of bioassays and high-throughput screening results

    Abstract. Background: High-throughput screening (HTS) is one of the main strategies to identify novel entry points for the development of small molecule chemical probes and drugs and is now commonly accessible to public sector research. Large amounts of data generated in HTS campaigns are submitted to public repositories such as PubChem, which is growing at an exponential rate. The diversity and quantity of available HTS assays and screening results pose enormous challenges to organizing, standardizing, integrating, and analyzing the datasets and thus to maximize the scientific and ultimately the public health impact of the huge investments made to implement public sector HTS capabilities. Novel approaches to organize, standardize and access HTS data are required to address these challenges. Results: We developed the first ontology to describe HTS experiments and screening results using expressive description logic. The BioAssay Ontology (BAO) serves as a foundation for the standardization of HTS assays and data and as a semantic knowledge model. In this paper we show important examples of formalizing HTS domain knowledge and we point out the advantages of this approach. The ontology is available online at the NCBO bioportal (http://bioportal.bioontology.org/ontologies/44531). Conclusions: After a large manual curation effort, we loaded BAO-mapped data triples into a RDF database store and used a reasoner in several case studies to demonstrate the benefits of formalized domain knowledge representation in BAO. The examples illustrate semantic querying capabilities where BAO enables the retrieval of inferred search results that are relevant to a given query, but are not explicitly defined. BAO thus opens new functionality for annotating, querying, and analyzing HTS datasets and the potential for discovering new knowledge by means of inference.

    CLO: The cell line ontology

    Abstract. Background: Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO consortium, new cell line cells, upper level alignment with the Cell Ontology (CL) and the Ontology for Biomedical Investigation, and logical extensions. Construction and content: Collaboration among the CLO, CL, and OBI has established consensus definitions of cell line-specific terms such as ‘cell line’, ‘cell line cell’, ‘cell line culturing’, and ‘mortal’ vs. ‘immortal cell line cell’. A cell line is a genetically stable cultured cell population that contains individual cell line cells. The hierarchical structure of the CLO is built based on the hierarchy of the in vivo cell types defined in CL and tissue types (from which cell line cells are derived) defined in the UBERON cross-species anatomy ontology. The new hierarchical structure makes it easier to browse, query, and perform automated classification. We have recently added classes representing more than 2,000 cell line cells from the RIKEN BRC Cell Bank to CLO. Overall, the CLO now contains ~38,000 classes of specific cell line cells derived from over 200 in vivo cell types from various organisms. Utility and discussion: The CLO has been applied to different biomedical research studies. Example case studies include annotation and analysis of EBI ArrayExpress data, bioassays, and host-vaccine/pathogen interaction. CLO’s utility goes beyond a catalogue of cell line types. The alignment of the CLO with related ontologies combined with the use of ontological reasoners will support sophisticated inferencing to advance translational informatics development. http://deepblue.lib.umich.edu/bitstream/2027.42/109554/1/13326_2013_Article_185.pd

    High quality, small molecule-activity datasets for kinase research [version 3; referees: 2 approved]

    Kinases regulate cell growth, movement, and death. Deregulated kinase activity is a frequent cause of disease. The therapeutic potential of kinase inhibitors has led to large amounts of published structure-activity relationship (SAR) data. Bioactivity databases such as the Kinase Knowledgebase (KKB), WOMBAT, GOSTAR, and ChEMBL provide researchers with quantitative data characterizing the activity of compounds across many biological assays. The KKB, for example, contains over 1.8M kinase structure-activity data points reported in peer-reviewed journals and patents. In the spirit of fostering methods development and validation worldwide, we have extracted and made available from the KKB 258K structure-activity data points and 76K associated unique chemical structures across eight kinase targets. These data are freely available for download within this data note.

    High quality, small molecule-activity datasets for kinase research [version 2; referees: 2 approved]

    Kinases regulate cell growth, movement, and death. Deregulated kinase activity is a frequent cause of disease. The therapeutic potential of kinase inhibitors has led to large amounts of published structure-activity relationship (SAR) data. Bioactivity databases such as the Kinase Knowledgebase (KKB), WOMBAT, GOSTAR, and ChEMBL provide researchers with quantitative data characterizing the activity of compounds across many biological assays. The KKB, for example, contains over 1.8M kinase structure-activity data points reported in peer-reviewed journals and patents. In the spirit of fostering methods development and validation worldwide, we have extracted and made available from the KKB 258K structure-activity data points and 76K associated unique chemical structures across eight kinase targets. These data are freely available for download within this data note.

    Polypharmacology or Promiscuity? Structural Interactions of Resveratrol With Its Bandwagon of Targets

    Resveratrol (3,4′,5-trihydroxy-trans-stilbene) is a natural phytoalexin found in grapes and has long been thought to be the answer to the “French Paradox.” There is no shortage of preclinical and clinical studies investigating the broad therapeutic activity of resveratrol. However, in spite of many comprehensive reviews published on the bioactivity of resveratrol, there has yet to be a report focused on the variety and complexity of its structural binding properties, and its multi-targeted role. An improved understanding of disease mechanisms at the systems level has enabled targeted polypharmacology to mature into a rational drug discovery approach. Unlike traditional hit-to-lead campaigns that typically optimize activity and selectivity for a single target, polypharmacological drugs aim to selectively target multiple proteins, while avoiding critical off-target interactions. This strategy bears promise of improved efficacy and reduced clinical attrition. This review seeks to investigate whether the bioactivity of resveratrol is due to a polypharmacological effect or promiscuity of the phenolic small molecule by examining the modes of binding with its diverse collection of protein targets. We focused on annotated targets, identified via the ChEMBL database, and matched these targets to a representative structure deposited in the Protein Data Bank (PDB), as crystal structures are most informative in understanding modes of binding at the atomic level. We discuss the structural aspects of resveratrol itself that permit binding to multiple proteins in various signaling pathways. Furthermore, we suggest that resveratrol’s bioactivity is a result of scaffold promiscuity rather than polypharmacology, and the variety of binding modes across targets displays little similarity in the pattern of target interaction.

    Prospective Exploration of Synthetically Feasible, Medicinally Relevant Chemical Space

    We describe a novel approach to direct the exploration of chemical space in an effort to balance synthetic accessibility and medicinal relevancy prior to experimental work. Reaction transforms containing empirical reactivity and compatibility information are dynamically assembled into reaction sequences (vProtocols) utilizing commercially available starting material feedstock. These vProtocols are evolved and optimized by a genetic algorithm, which leverages fitness functions based on predicted properties of generated molecular products. We present the underlying concepts, methodology and initial results of this prospective approach.